Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Import speedup - Ticket #2181 #334

Closed
wants to merge 9 commits into from
Closed

Conversation

scottza
Copy link
Contributor

@scottza scottza commented Jul 10, 2012

Applies the patch submitted at http://projects.scipy.org/numpy/ticket/2181

see mailing list discussion:

http://mail.scipy.org/pipermail/numpy-discussion/2012-July/063130.html

especially - http://mail.scipy.org/pipermail/numpy-discussion/2012-July/063194.html - should polytemplate.py be run at build time instead of requiring it to be run manually when edited.

adalke and others added 2 commits July 10, 2012 11:02
Currently there are 5 places under numpy/polynomial which do
something like:

  exec polytemplate.substitute(name='Hermite', nick='herm', domain='[-1,1]')

I have changed those so there are start/end markers, like

  ...

I have edited  polytemplate.py so that it implements a __main__
which finds those block markers and replaces the content with
the appropriate "polytemplate" substitution.

This means that if you edit the template then you will need
to manually run 'polytemplate.py' so that the dependent files
are rebuilt based on the new template.

The performance of

  python -c 'import time; t1=time.time(); import numpy; print time.time()-t1'

goes from 0.079 seconds to 0.057 seconds (best of 10), for
an import speedup of ~25%.
@travisbot
Copy link

This pull request fails (merged d59b142 into 436a28f).

@scottza
Copy link
Contributor Author

scottza commented Jul 10, 2012

By the way, all tests pass on my 64-bit Ubuntu machine for py24 through py32 using tox. Looks like we got the bad Travis worker machine..

@rgommers
Copy link
Member

Yes, that's an unrelated failure.

Not requiring a manual action would indeed be better I think. But I'm okay with whatever option @charris prefers.

@charris
Copy link
Member

charris commented Jul 10, 2012

Looks like a lot of this is a style cleanup that should probably come in as a separate pull request.

On the template, I have mixed feelings. The speed up isn't large and the added amount of code is significant. That said, I was never really happy with the template as is. And I'm not really happy with the current proposal either ;) It would be ideal if the files could be rewritten during the build phase, but that begins to become complicated... I'd like to think about possible solutions a bit more. Any other ideas? Note the main problem I had was the type specific documentation, otherwise the template could have been implemented as a mixin class.

@rgommers
Copy link
Member

Rewriting at build time doesn't seem all that complicated; it's what I had in mind in the first place. Whether to do it like in this PR or by simply doing the current exec line at build time is a matter of preference I think. The latter doesn't increase code size at all, besides a few lines in setup.py I'd think.

To be more concrete, what I was thinking was to rename the current files to an underscored version, i.e. chebyshev.py --> _chebyshev.py and then during build do

lines = readlines(_chebyshev.py)
lines.append(exec polytemplate.replace...)
# write lines to chebyshev.py

@charris
Copy link
Member

charris commented Jul 10, 2012

That seems reasonable and should be flexible enough that the template is still usable for other polynomial types. Not sure how this looks in Bento or Numscons.

@charris
Copy link
Member

charris commented Jul 10, 2012

Although the polynomial package will no longer be stand alone. Not sure that that matters.

@rgommers
Copy link
Member

Well, it's still stand alone in that it doesn't depend on the rest of Numpy. Don't think it matters much.

@teoliphant
Copy link
Member

I agree with Chuck that a lot of this diff is white-space and other style changes which should be separate from this PR. My $0.02 is that re-writing at build-time is not any more complicated than we are doing elsewhere and if it helps import times, I'm supportive. Everything I've heard from David tells me that this won't affect Bento at all, but David would know more about that.

This reverts commit d59b142.

Revert style related changes which are orthogonal to the purpose of
this branch.
* This implementation is based on the outline suggested by Ralf
  Gommers in numpy#334
@scottza
Copy link
Contributor Author

scottza commented Jul 11, 2012

I've reverted the commit with the style clean ups and implemented build time generation of the polynomial class modules. My changes shouldn't cause any problems with the Bento build, but I can't confirm that because the Bento build is currently broken (unrelated).

@scottza
Copy link
Contributor Author

scottza commented Jul 11, 2012

Actually, my bad on the Bento build. I think that a bscript implementing the build time generation will be required...

@charris
Copy link
Member

charris commented Jul 11, 2012

I just removed import polynomial from numpy/init.py and things work just fine from my point of view.

base_path = os.path.join('numpy', 'polynomial')

fp = open(os.path.join(base_path, base_file))
lines = fp.readlines()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not fp.read?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No good reason. That would be better.

@rgommers
Copy link
Member

Removing import polynomial probably won't fly, unless we're okay with breaking some user code. Now doing

import numpy as np.
np.polynomial.use_a_func()  

works.

@charris
Copy link
Member

charris commented Jul 11, 2012

True, but I wonder if it's worth it to have it at that level. I put it there originally because that was how other packages were treated, but I wonder if requiring

from numpy.polynomial import Polynomial

Is too much to ask? If we refuse to make these small changes here and there it's hard to get things cleaned up.

@rgommers
Copy link
Member

Indeed, it's hard to clean up. The cost of backwards compatibility I'm afraid. I agree it would have been better to not import it by default. I'm all for more and smaller namespaces anyway (I like np.<tab> not to return a list of 600 things) , but I guess that ship has sailed.

@charris
Copy link
Member

charris commented Jul 11, 2012

Sigh. OK, the easy solution to import times isn't available. Last time I tried to deprecate imports didn't work that well eithe, I couldn't make the deprecation messages show up reliably.

I was guessing that the package wasn't used that much until recently due to the missing documentation, but I think that is starting to change...

@charris
Copy link
Member

charris commented Jul 11, 2012

Maybe the lazy import ideas would also be a good way to deprecate imports.

@rgommers
Copy link
Member

Please no. I wasted so much time hunting down bugs and then removing the lazy imports from scipy. I think this PR is a much better solution.

@charris
Copy link
Member

charris commented Jul 11, 2012

I was thinking 'and', not exclusive 'or' ;) However, it might be useful to have a way to reliably deprecate imports down the road.

@rgommers
Copy link
Member

That would be nice. But still I'd rather not have that than have lazy imports. It will break the most random stuff (like py2exe maybe).

@njsmith
Copy link
Member

njsmith commented Jul 13, 2012

(And for the record, I'd like a better explanation of why lazy importing won't work before we go breaking the API. I understand it caused problems in Scipy, and sure, likely it will cause problems here. But I can't tell how many of those problems were intrinsic to lazy importing, and how many were caused by _import_tools being incredibly convoluted and over-complicated. Certainly both tab-completion and trac bug #1501 seem to be _import_tools-specific problems. It's true py2exe would need some hand-holding to find numpy libraries, but py2exe needs hand-holding every time you sneeze, and has a whole wiki full of examples of how to do it.)

@njsmith
Copy link
Member

njsmith commented Jul 13, 2012

On second thought, this is the wrong place for that discussion anyway. I suggest we leave the 'import .polynomial" out of this PR and take that discussion to the list?

@scottza
Copy link
Contributor Author

scottza commented Jul 13, 2012

@njsmith Hmmm, something odd on my machine then. No errors reported by Tox, but when I install using Python3.2 directly I get the same failure reported by Travisbot:

Traceback (most recent call last):
File "", line 1, in
File "/home/scott/.local/lib/python3.2/site-packages/numpy/init.py", line 153, in
from . import polynomial
File "/home/scott/.local/lib/python3.2/site-packages/numpy/polynomial/init.py", line 18, in
from polynomial import Polynomial
ImportError: No module named polynomial

if sys.version_info[0] >= 3:
rel_import = "from . import"
else:
rel_import = "import"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I bet it's the lack of this code that's causing the import problem on Py3.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That was also my first thought, but I don't understand why and won't be
able to get back to it until the middle of next week.
On Jul 13, 2012 6:09 PM, "njsmith" <
reply@reply.github.com>
wrote:

"""
import string

-import sys

-if sys.version_info[0] >= 3:

  • rel_import = "from . import"
    -else:
  • rel_import = "import"

I bet it's the lack of this code that's causing the import problem on Py3.


Reply to this email directly or view it on GitHub:
https://github.com/numpy/numpy/pull/334/files#r1159923

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tl;dr We're confusing 2to3. Py3 builds of this branch against Git result in failure. Py3 builds against a source dist work fine. Regenerate numpy/polynomial/__init__.py at build time for Py3, or use explicit imports everywhere.


I've just had another look at this. This code removal is unrelated to the Py3 errors. The problem is related to the 2to3 tool.

When a Py3 build is run directly from the repository, the 2to3 tool doesn't refactor the relative imports (e.g. from polynomial import Polynomial) in numpy/polynomial/__init__.py (presumably because the files don't exist at the time 2to3 is run and the tool must therefore assume they are meant to be external imports).

When a Py3 build is run from a source distribution, the numpy/polynomial/polynomial.py etc. files already exist and the relative imports can be correctly generated in numpy/polynomial/__init__.py. This explains why testing with tox works on my machine - the build and test cycle for each Py version is based on a single sdist created by the system python and cached at ~/.tox/distshare/

Travis CI farms out the build tasks and runs the build and test for each Py version against a fresh Git checkout - hence the reported Py3 failures.

To fix, numpy/polynomial/__init__.py also needs to be regenerated at build time for Py3, or use explicit imports from numpy.polynomial.polynomial import ... etc. everywhere.

@rgommers
Copy link
Member

@njsmith: unless I'm completely missing the point here, we're not breaking the API here. All we do is reducing import time, by removing an expensive exec statement. Therefore I fail to see the point of adding lazy imports.

@njsmith
Copy link
Member

njsmith commented Jul 13, 2012

@rgommers: Yeah, that was in response to the discussion between you and @charris about avoiding top-level imports, which I see I didn't read closely enough. I agree it has nothing to do with this pull request.

@travisbot
Copy link

This pull request passes (merged 76b700c into 436a28f).

@scottza
Copy link
Contributor Author

scottza commented Jul 16, 2012

<High fives travisbot>

@njsmith
Copy link
Member

njsmith commented Jul 16, 2012

Looks fine to me now. I'll leave @charris make the final determination though.

@charris
Copy link
Member

charris commented Jul 16, 2012

I'm fine with putting this in, but I think the numscons build should work first. We can probably leave the bento build to David. I suspect David is travelling at the moment...

@cournape David, what do we need to do for numscons and bento?

@scottza
Copy link
Contributor Author

scottza commented Jul 16, 2012

For Bento - I now think that a bento.info file containing at least HookFile: bscript is required in addition to the bscript I outlined earlier, but that's mostly guesswork..

I have zero understanding of Numscons.

@charris
Copy link
Member

charris commented Jul 16, 2012

I'll take a look a numscons if David doesn't chime in.

@charris
Copy link
Member

charris commented Jul 16, 2012

The install puts the generated files in numpy/polynomial instead of in the build directory, that needs to be fixed.

@scottza
Copy link
Contributor Author

scottza commented Jul 16, 2012

Okay. I'll change that.

The current method works, but I see the benefit of being able to revert to
a clean state by just deleting the build directory.

@charris
Copy link
Member

charris commented Jul 16, 2012

Thanks. Looks like it could be a bit of work to figure out where everything goes what with all the different directories that accumulate under build for the different the python versions etc. Somewhere the path gets passed down from the top but I'm not sure how. I've avoided dealing with build stuff over the years.

@travisbot
Copy link

This pull request passes (merged ddaadac into 436a28f).

@scottza
Copy link
Contributor Author

scottza commented Jul 17, 2012

Not tested on Windows/Mac etc..

@charris
Copy link
Member

charris commented Sep 21, 2012

I think this needs to be brought to completion. I'll take a look this weekend.

@charris
Copy link
Member

charris commented Aug 10, 2013

@scottza needs a rebase. Much has changed since the original PR.

@charris
Copy link
Member

charris commented Aug 10, 2013

@scottza In fact, I suggest you redo this PR starting with current master. That is probably the easiest way to clean up the history, and now that you know what you are doing it probably isn't that much work..

@scottza
Copy link
Contributor Author

scottza commented Aug 12, 2013

I'll take a look...

On 10 August 2013 21:07, Charles Harris notifications@github.com wrote:

@scottza https://github.com/scottza In fact, I suggest you redo this PR
starting with current master. That is probably the easiest way to clean up
the history, and now that you know what you are doing it probably isn't
that much work..


Reply to this email directly or view it on GitHubhttps://github.com//pull/334#issuecomment-22445388
.

@scottza
Copy link
Contributor Author

scottza commented Aug 19, 2013

@charris - you're right, it was easier to create a fresh pull request #3639

@scottza scottza closed this Aug 19, 2013
luyahan pushed a commit to plctlab/numpy that referenced this pull request Apr 25, 2024
refactor: Simplify vaddw_* implementations
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

7 participants